                          DOCUMENT RESUME

ED 124 594                                            TM 005 353

AUTHOR          Baker, Eva L.
TITLE           Lean Data Strategies for Formative Evaluation.
PUB DATE        [Apr 76]
NOTE            10p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (60th, San
                Francisco, California, April 19-23, 1976)

EDRS PRICE      MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS     *Data Collection; *Evaluation Methods; *Formative
                Evaluation; Instructional Improvement

ABSTRACT



The intent of formative evaluation is to improve
programs as well as to justify their continuation. It is critical to
separate clearly those functions of the evaluation which are
political from those which may lead the way to instructional
improvement. Data for formative evaluation should be gathered in an
interpretable way at the level at which decisions will be
implemented, usually at the classroom level. This forces data
collection activities to a single planned incursion. This
recommendation may suggest rethinking some of the measurement
differences usually assumed to distinguish good classroom from good
program evaluation. Purposive evaluation mandates asking questions
about those areas over which a program developer or implementer
exerts some control. Sometimes the specific implications of
alternative data patterns are not kept in mind during the design of
evaluations; the evaluator should attempt to foresee plausible
alternative patterns the data might take. Rather than literally apply
the evaluation plans of others, evaluation designs should be allowed
to conform to the specific questions needing answers. What is desired
are evaluation activities that have instructional improvement as well
as, where necessary, political utilities; where, with limited waste,
data are provided that are pertinent and obvious to program needs.
(HC)




* Documents acquired by ERIC include many informal unpublished       *
* materials not available from other sources. ERIC makes every effort *
* to obtain the best copy available. Nevertheless, items of marginal  *
* reproducibility are often encountered and this affects the quality  *
* of the microfiche and hardcopy reproductions ERIC makes available   *
* via the ERIC Document Reproduction Service (EDRS). EDRS is not      *
* responsible for the quality of the original document. Reproductions *
* supplied by EDRS are the best that can be made from the original.   *



LEAN DATA STRATEGIES FOR FORMATIVE EVALUATION

Eva L. Baker

Center for the Study of Evaluation

University of California, Los Angeles



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION



Paper presented at a Symposium at the
Annual Meeting,
American Educational Research Association,
San Francisco, California, April, 1976.



Lean Data Strategies for Formative Evaluation

Eva L. Baker
Center for the Study of Evaluation
University of California, Los Angeles

At the outset, I wish to apologize for the title and, in it,
for the use of the term "lean". My adaptation of this word for
application as a formative evaluation precept derived from one of
Sue Markle's rules for the preparation of good programmed
instruction. Programs were supposed to be written "leanly", that is, in
minimal versions, so that field tests could provide information
about what to add rather than require the more difficult inference
about what to delete. "Lean" now strikes me as a word more
appropriate for utterance by other scholars in Chicago, specifically,
Oscar Mayer and Farmer John, and I permanently consign the use of
the term to them and their products.

Second, the economic constraints implied in the symposium title
words, tight money, suggest that the attribute of parsimony in
evaluation strategy is something forced upon us, like margarine or Swanson
Frozen Dinners, because we can't afford to do things better and more
expensively. I would propose to take the position that limits on
the scope of evaluation are important for reasons other than thrift
or deprivation and that the activity of evaluation itself demands
concision of process.



Brief History

Although the activities now associated with evaluation had
sludged along for years in the educational world, they coalesced
to form a defined field of inquiry only relatively recently (and
we are all familiar with legislative and bureaucratic pressures
that nominally gave evaluation activities their boost into
prominence). The emergence of evaluation as something approaching a
"field of inquiry" can be dated little more than ten years ago.
Around that time people began to name themselves "evaluators",
seeking an identifiable affiliation in doing so. Graduate programs
appeared in which evaluation was legitimized as an endeavor worthy
of an advanced degree. Scholars created sets of papers that attempted
to define parameters of knowledge appropriate for evaluation. These
papers were shared in elitist, ceremonial rites periodically and
also published to inform a growing constituency. Followers or advocates
of alternative models sprang up. The Charisma Coefficient, an
indicator of professional efficacy, emerged, in which the strength of
following was directly related to the personal magnetism of the
model-maker. Thus, we were presented with a series of papers describing
alternative models and their variations, such as CIPP, Countenance,
Discrepancy, Goal-free and on and on. One confusing note in this
period that has persisted was the ambiguity regarding what these models
were for. Were they presented to help us organize the way in which
we thought about evaluation, or was their purpose to control our
actions and guide specific ways in which the evaluation was conducted?

These models may be regarded, then, as competitors for the hearts
and minds of the people, however unlikely that intent in their
development. The better models seemed to be more comprehensive and thorough.
In fact, certain models apparently arrogated functions that
encompassed all of educational activity. For example, evaluation was
seen to be critical at the point where goals were articulated,
programs were planned, programs were implemented and results obtained;
and clearly, not only was the idea of evaluation important at each
point, but detailed procedures to help evaluators conduct their
business were also developed. Again, the more comprehensive the
procedures, the better. For instance, these days in order to conduct
a credible needs assessment, it is desirable to sample widely among
real and imagined constituencies, so that students, teachers,
administrators, parents, community members ... all are represented. The
needs assessment would then seem to be comprehensive, almost
independent of whether those sampled had a sensible reason for being included
at all. Because "limits to growth" was not a popular idea during the
profligate sixties, the values of thoroughness and comprehensiveness
were incorporated into referent works on evaluation and remain intact
today.

Evaluation endeavors have evolved into a new meta-model. We
have now firmly developed Procedure-Referenced, Decision-Free
evaluation. As long as procedures are carefully followed, and we
circumspectly remember to inflict them on all kinds of participants, our
evaluation seems to meet the state-of-the-art, and stimulate familiar
laments about why nobody uses what we do.

Kurt Vonnegut, in a book with an unpronounceable title, discussed
speech-giving in the same way. No one cares about or remembers anything
you say. What is important is that you seem to give a speech.

One of the contributing explanations for the procedure
orientation of our evaluation efforts relates to the increasing sense that
evaluation has primarily a political rather than a program
improvement purpose. Evaluations are conducted because they are mandated,
because we must justify previous decisions, because we need to show
cause for increased funding. In the needs assessment example, we
query multiple constituencies partly because we are interested in
their perceptions but sometimes because we don't want to be accused
of forgetting anyone. We involve complex procedures in our
evaluations because they make us seem to be more credible. While it is
difficult to argue that political factors are unimportant, they
seem to confound our approach to evaluation so that, if those
factors were put aside, we might not know the information we need
to make a rational set of decisions anyhow.

Purpose and Plasticity

These remarks are directed, of course, not to the givers of
evaluation models, but to the users of them. And the focus of the
rest of the paper shall be on remedies in the context of formative
evaluation.

With formative evaluation, our intent is to improve programs
as well as to justify their continuation. Program improvement must
focus at the classroom or learner level. Thus, it is critical in
formative evaluation to separate clearly those functions of the
evaluation which are political from those which may lead the way to
instructional improvement.

Specifically, what would that mean? Suppose one were
interested in evaluating and improving a statewide educational
program. One could develop and implement measures that were
interpretable at the school district level, if the district were
the main administrator of the program. Such information would be
helpful for the state evaluators and interesting to the district people.
However, if the district wished to make decisions related to the
improvement of school performance, school level information would
be necessary. Thus the district might develop measures and a
design to provide data for each participating school. Should a
principal at a given school wish information, he or she would
need to collect data for each of the classrooms in operation. And
finally, should a given teacher in a classroom in a school in a
district in an enlightened state wish information, he or she would
obtain it from the students. Ultimately, for a program to produce
improvement in learning, that teacher must make some good decisions
about specific children. Our mythology is such that we believe
that information about how the children are learning should help
in that process.

Formative evaluation questions at the classroom/learner level
seem to be rarely asked outside of organized, institutionally-based
research and development efforts. Instead, each level of
management, state, district, or school, seems to perpetrate another sort
of evaluation effort.



Recommendation 1: Incursive evaluation

In the name of economy of time and animosity, data for
formative evaluation should be gathered in an interpretable way at the
level at which decisions will be implemented, usually at the
classroom level. We need information about how specific children are
doing on important outcomes. Such a desire certainly does not
preclude sampling of persons or items or inhibit the manner of data
reduction or aggregation useful for review at the subsequent
administrative levels. Rather its force is to reduce data collection
activities to a single carefully planned incursion. While the
decision purposes for the evaluation data might be different for
different users, various requirements may be attended to by the
manner of aggregation and reporting rather than, as in many cases,
the conduct of separate measurement activities. This recommendation
may suggest rethinking some of the measurement differences usually
assumed to distinguish good classroom from good program evaluation.
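
As a rough illustration of this recommendation, the sketch below takes student-level records from one data collection and produces classroom, school, and district summaries purely by changing the manner of aggregation. The records, field names, scores, and the aggregate helper are all hypothetical and are not drawn from the paper; the mean is used only as one possible summary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records from ONE data collection: one row per student,
# with a score (items correct) on an important outcome measure.
records = [
    {"district": "D1", "school": "S1", "classroom": "C1", "student": "a", "score": 8},
    {"district": "D1", "school": "S1", "classroom": "C1", "student": "b", "score": 6},
    {"district": "D1", "school": "S1", "classroom": "C2", "student": "c", "score": 4},
    {"district": "D1", "school": "S1", "classroom": "C2", "student": "d", "score": 6},
    {"district": "D1", "school": "S2", "classroom": "C3", "student": "e", "score": 9},
    {"district": "D1", "school": "S2", "classroom": "C3", "student": "f", "score": 7},
]

def aggregate(rows, keys):
    """Group the same student-level rows by the given keys and average the scores."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["score"])
    return {group: round(mean(scores), 2) for group, scores in groups.items()}

# One incursion, three reports: only the grouping changes.
by_classroom = aggregate(records, ["district", "school", "classroom"])
by_school = aggregate(records, ["district", "school"])
by_district = aggregate(records, ["district"])

print(by_classroom)
print(by_school)
print(by_district)
```

The teacher still has the individual rows for decisions about specific children, while the principal and district receive summaries of the same measurement rather than commissioning separate ones.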

Recommendation 2: Purposive evaluation

Purposive evaluation invokes a simple mandate: ask questions
about those areas over which, as a program developer or implementer,
you exert some control. Because of the procedural or political
orientation of much of what we do, sometimes the specific
implications of alternative data patterns are not kept in mind during the
design of evaluations. One of the best residuals of our research
endeavors is the alternative hypothesis end-in-view. In research,
we try to anticipate alternative hypotheses which can sensibly
explain observations. In evaluation, it seems that the evaluator
should attempt to foresee plausible alternative patterns the data
might take. If the evaluator cannot imagine the consequences of
data configurations for program improvement, then he or she might
well consider whether certain questions need to be asked at all.
For instance, if teacher age is found to be negatively correlated
with program performance, but neither withholding the program
from older teachers nor rejuvenation is possible on a cost-effective
basis, then age information and many other demographic facts need
not be assembled.

The recommendations for incursive and purposive evaluation
designs suggest a retreat from procedure-oriented evaluation. Rather
than apply literally the evaluation plans of others, our own
evaluation designs should be allowed to conform to the specific
questions we must have answered. Plasticity implies flexibility,
moldability and idiosyncrasy. Our evaluations will not seem as
credible, because they will be based on local requirements rather
than on cosmological methods. They will have to be justified by
utility rather than authority. This should not suggest that
evaluations are procedure-free. They may adapt well routinized
methods for deciding on questions of importance and for obtaining,
analyzing, and reporting our data. But the purpose controls the
procedures.






Aside from cost savings derived from limiting measurements
to few occasions and to consequence-related questions, support
for stringent evaluation strategy comes from, of all places, an
interpretation of the Second Law of Thermodynamics. Barry
Commoner, in a recent article in the New Yorker, discusses the
energy crisis with an analysis of the First and Second Laws of
Thermodynamics. The First Law would suggest that, given a set
of procedures, we attempt to implement them most efficiently.
Thus, we might employ matrix sampling, and answer given questions
in a way that saves both time and money. The Second Law, the
law of entropy, suggests that every move we make contributes to
the eventual randomness in the Universe. We would do best, then,
to rethink our questions and initiate the fewest activities
required to provide answers, thereby creating the least disorder.

What we wish to develop, then, is virtual evaluation:
evaluation activities that have instructional improvement as well as,
where necessary, political utilities; where in a conserving way
with limited waste, we provide data that are so pertinent and
obvious to program needs that the lament that no one cares about
evaluation is forgotten from disuse.




